202 research outputs found
On predictability of rare events leveraging social media: a machine learning perspective
Information extracted from social media streams has been leveraged to
forecast the outcome of a large number of real-world events, from political
elections to stock market fluctuations. An increasing amount of studies
demonstrates how the analysis of social media conversations provides cheap
access to the wisdom of the crowd. However, extents and contexts in which such
forecasting power can be effectively leveraged are still unverified at least in
a systematic way. It is also unclear how social-media-based predictions compare
to those based on alternative information sources. To address these issues,
here we develop a machine learning framework that leverages social media
streams to automatically identify and predict the outcomes of soccer matches.
We focus in particular on matches in which at least one of the possible
outcomes is deemed as highly unlikely by professional bookmakers. We argue that
sport events offer a systematic approach for testing the predictive power of
social media, and allow to compare such power against the rigorous baselines
set by external sources. Despite such strict baselines, our framework yields
above 8% marginal profit when used to inform simple betting strategies. The
system is based on real-time sentiment analysis and exploits data collected
immediately before the games, allowing for informed bets. We discuss the
rationale behind our approach, describe the learning framework, its prediction
performance and the return it provides as compared to a set of betting
strategies. To test our framework we use both historical Twitter data from the
2014 FIFA World Cup games, and real-time Twitter data collected by monitoring
the conversations about all soccer matches of four major European tournaments
(FA Premier League, Serie A, La Liga, and Bundesliga), and the 2014 UEFA
Champions League, during the period between Oct. 25th 2014 and Nov. 26th 2014.Comment: 10 pages, 10 tables, 8 figure
Real-time classification of malicious URLs on Twitter using Machine Activity Data
Massive online social networks with hundreds of millions of active users are increasingly being used by Cyber criminals to spread malicious software (malware) to exploit vulnerabilities on the machines of users for personal gain. Twitter is particularly susceptible to such activity as, with its 140 character limit, it is common for people to include URLs in their tweets to link to more detailed information, evidence, news reports and so on. URLs are often shortened so the endpoint is not obvious before a person clicks the link. Cyber criminals can exploit this to propagate malicious URLs on Twitter, for which the endpoint is a malicious server that performs unwanted actions on the person’s machine. This is known as a drive-by-download. In this paper we develop a machine classification system to distinguish between malicious and benign URLs within seconds of the URL being clicked (i.e. ‘real-time’). We train the classifier using machine activity logs created while interacting with URLs extracted from Twitter data collected during a large global event – the Superbowl – and test it using data from another large sporting event – the Cricket World Cup. The results show that machine activity logs produce precision performances of up to 0.975 on training data from the first event and 0.747 on a test data from a second event. Furthermore, we examine the properties of the learned model to explain the relationship between machine activity and malicious software behaviour, and build a learning curve for the classifier to illustrate that very small samples of training data can be used with only a small detriment to performance
Empirical competence-testing: A psychometric examination of the German version of the Emotional Competence Inventory
The “Emotional Competence Inventory“ (ECI 2.0) by Goleman and Boyatzis assesses emotional intelligence (EI) in organizational context by means of 72 items in 4 clusters (self-awareness, self- management, social awareness, social skills) which at large consist of 18 competencies. Our study examines the psychometric properties of the first German translation of this instrument in two different surveys (N = 236). If all items are included in reliability analysis the ECI is reliable (Cronbach’s Alpha = .90), whereas the reliability of the four sub dimensions is much smaller (Alpha = .62 - .81). For 43 items the corrected item-total correlation with its own scale is higher than correlations with the other three clusters. Convergent validity was examined by using another EI instrument (Wong & Law, 2002). We found a significant correlation between the two instruments (r = .41). The German version of the ECI seems to be quite useful, although the high reliability is achieved by a large number of items. Possibilities of improvement are discussed
Emotional Intelligence and its consequences for occupational and life satisfaction - Emotional Intelligence in the context of irrational beliefs
According to Albert Ellis' theory of Rational Emotive Behavior Therapy irrational beliefs (IB) lead to maladaptive emotions. A central component of irrationality is the denial of one's own possibilities to control important aspects of life. A specific IB is that one cannot control and thus cannot avoid certain emotion states. Emotion research considers regulative emotion control a pivotal component of the concept of emotional intelligence (EI). A negative association between IB and EI can thus be theoretically derived from both concepts. Furthermore both should be related to life satisfaction. We examined the relationship between IB and EI using standardized questionnaire instruments and the predictive value of both concepts regarding life satisfaction. We found a significant negative correlation between both conceptions (r = -.21). Life satisfaction and occupational satisfaction are better predicted by IB. R² increases from .04 to .12 when both concepts are incorporated in regression analysis
Beating the news using social media: the case study of American Idol
We present a contribution to the debate on the predictability of social events using big data analytics. We focus on the elimination of contestants in the American Idol TV shows as an example of a well defined electoral phenomenon that each week draws millions of votes in the USA. This event can be considered as basic test in a simplified environment to assess the predictive power of Twitter signals. We provide evidence that Twitter activity during the time span defined by the TV show airing and the voting period following it correlates with the contestants ranking and allows the anticipation of the voting outcome. Twitter data from the show and the voting period of the season finale have been analyzed to attempt the winner prediction ahead of the airing of the official result. We also show that the fraction of tweets that contain geolocation information allows us to map the fanbase of each contestant, both within the US and abroad, showing that strong regional polarizations occur. The geolocalized data are crucial for the correct prediction of the final outcome of the show, pointing out the importance of considering information beyond the aggregated Twitter signal. Although American Idol voting is just a minimal and simplified version of complex societal phenomena such as political elections, this work shows that the volume of information available in online systems permits the real time gathering of quantitative indicators that may be able to anticipate the future unfolding of opinion formation events
Trump vs. Hillary: What went Viral during the 2016 US Presidential Election
In this paper, we present quantitative and qualitative analysis of the top
retweeted tweets (viral tweets) pertaining to the US presidential elections
from September 1, 2016 to Election Day on November 8, 2016. For everyday, we
tagged the top 50 most retweeted tweets as supporting or attacking either
candidate or as neutral/irrelevant. Then we analyzed the tweets in each class
for: general trends and statistics; the most frequently used hashtags, terms,
and locations; the most retweeted accounts and tweets; and the most shared news
and links. In all we analyzed the 3,450 most viral tweets that grabbed the most
attention during the US election and were retweeted in total 26.3 million times
accounting over 40% of the total tweet volume pertaining to the US election in
the aforementioned period. Our analysis of the tweets highlights some of the
differences between the social media strategies of both candidates, the
penetration of their messages, and the potential effect of attacks on bothComment: Paper to appear in Springer SocInfo 201
A meta-analysis of state-of-the-art electoral prediction from Twitter data
Electoral prediction from Twitter data is an appealing research topic. It
seems relatively straightforward and the prevailing view is overly optimistic.
This is problematic because while simple approaches are assumed to be good
enough, core problems are not addressed. Thus, this paper aims to (1) provide a
balanced and critical review of the state of the art; (2) cast light on the
presume predictive power of Twitter data; and (3) depict a roadmap to push
forward the field. Hence, a scheme to characterize Twitter prediction methods
is proposed. It covers every aspect from data collection to performance
evaluation, through data processing and vote inference. Using that scheme,
prior research is analyzed and organized to explain the main approaches taken
up to date but also their weaknesses. This is the first meta-analysis of the
whole body of research regarding electoral prediction from Twitter data. It
reveals that its presumed predictive power regarding electoral prediction has
been rather exaggerated: although social media may provide a glimpse on
electoral outcomes current research does not provide strong evidence to support
it can replace traditional polls. Finally, future lines of research along with
a set of requirements they must fulfill are provided.Comment: 19 pages, 3 table
Collective emotions online and their influence on community life
E-communities, social groups interacting online, have recently become an
object of interdisciplinary research. As with face-to-face meetings, Internet
exchanges may not only include factual information but also emotional
information - how participants feel about the subject discussed or other group
members. Emotions are known to be important in affecting interaction partners
in offline communication in many ways. Could emotions in Internet exchanges
affect others and systematically influence quantitative and qualitative aspects
of the trajectory of e-communities? The development of automatic sentiment
analysis has made large scale emotion detection and analysis possible using
text messages collected from the web. It is not clear if emotions in
e-communities primarily derive from individual group members' personalities or
if they result from intra-group interactions, and whether they influence group
activities. We show the collective character of affective phenomena on a large
scale as observed in 4 million posts downloaded from Blogs, Digg and BBC
forums. To test whether the emotions of a community member may influence the
emotions of others, posts were grouped into clusters of messages with similar
emotional valences. The frequency of long clusters was much higher than it
would be if emotions occurred at random. Distributions for cluster lengths can
be explained by preferential processes because conditional probabilities for
consecutive messages grow as a power law with cluster length. For BBC forum
threads, average discussion lengths were higher for larger values of absolute
average emotional valence in the first ten comments and the average amount of
emotion in messages fell during discussions. Our results prove that collective
emotional states can be created and modulated via Internet communication and
that emotional expressiveness is the fuel that sustains some e-communities.Comment: 23 pages including Supporting Information, accepted to PLoS ON
SentiBench - a benchmark comparison of state-of-the-practice sentiment analysis methods
In the last few years thousands of scientific papers have investigated
sentiment analysis, several startups that measure opinions on real data have
emerged and a number of innovative products related to this theme have been
developed. There are multiple methods for measuring sentiments, including
lexical-based and supervised machine learning methods. Despite the vast
interest on the theme and wide popularity of some methods, it is unclear which
one is better for identifying the polarity (i.e., positive or negative) of a
message. Accordingly, there is a strong need to conduct a thorough
apple-to-apple comparison of sentiment analysis methods, \textit{as they are
used in practice}, across multiple datasets originated from different data
sources. Such a comparison is key for understanding the potential limitations,
advantages, and disadvantages of popular methods. This article aims at filling
this gap by presenting a benchmark comparison of twenty-four popular sentiment
analysis methods (which we call the state-of-the-practice methods). Our
evaluation is based on a benchmark of eighteen labeled datasets, covering
messages posted on social networks, movie and product reviews, as well as
opinions and comments in news articles. Our results highlight the extent to
which the prediction performance of these methods varies considerably across
datasets. Aiming at boosting the development of this research area, we open the
methods' codes and datasets used in this article, deploying them in a benchmark
system, which provides an open API for accessing and comparing sentence-level
sentiment analysis methods
- …